Gender Classification of Weblog Authors

Authors

  • Xiang Yan
  • Ling Yan
Abstract

In this paper, we present a Naïve Bayes classification approach to identifying the genders of weblog authors. In addition to the features employed in traditional text categorization, we use weblog-specific features such as web page background colors and emoticons. Our work-in-progress results, although preliminary, outperform the chosen baseline. They also suggest room for significant improvement once more advanced functionalities of the classifier are implemented.
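
The abstract does not spell out the model, so the following is only a minimal sketch, assuming a multinomial Naïve Bayes with add-one (Laplace) smoothing over a bag of string features. Ordinary word tokens stand in for the traditional text-categorization features, and two weblog-specific feature types (emoticons and the page background color) are appended as extra tokens. The emoticon regex, the feature prefixes (`word=`, `emo=`, `bg=`), the `NaiveBayes` class and the toy training examples are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoticon pattern (e.g. ":)", ":-(", ";D", ":P"); the paper's
# actual emoticon inventory is not given.
EMOTICON_RE = re.compile(r"[:;]-?[()DPp]")


def extract_features(text, background_color):
    """Turn one weblog page into a bag of string features (hypothetical scheme)."""
    emoticons = EMOTICON_RE.findall(text)
    # Strip emoticons first so they do not leave stray characters in word tokens.
    words = re.findall(r"[a-z']+", EMOTICON_RE.sub(" ", text).lower())
    features = [f"word={w}" for w in words]
    features += [f"emo={e}" for e in emoticons]
    features.append(f"bg={background_color.lower()}")
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_counts = Counter()            # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for feats, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_posterior(self, feats, label):
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)  # log prior
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for f in feats:
            if f in self.vocab:  # ignore features never seen in training
                score += math.log((counts[f] + self.alpha) / denom)
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda c: self.log_posterior(feats, c))


# Toy, invented examples for illustration only; not data from the paper.
TRAIN = [
    ("had a lovely day with my sister :)", "#FFF0F5", "female"),
    ("cannot wait for the weekend :D", "#FFF0F5", "female"),
    ("fixed the bike chain this morning", "#202020", "male"),
    ("long day at work, heading home", "#202020", "male"),
]

model = NaiveBayes().fit(
    [extract_features(text, bg) for text, bg, _ in TRAIN],
    [label for _, _, label in TRAIN],
)

if __name__ == "__main__":
    print(model.predict(extract_features("what a lovely weekend :)", "#FFF0F5")))  # → female
```

Treating the background color and each emoticon as just another token keeps the classifier unchanged: the weblog-specific signals simply enter the same smoothed per-class likelihood product as the words do.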

Similar resources

Writing from Experience Presentations of Gender Identity on Weblogs

This article examines how weblog authors present their online gender identity, in order to establish how these modes of presentation fit into the research landscape about gender identity and computer-mediated communication (CMC). After a preliminary descriptive analysis of a sample of Dutch and Flemish weblogs, the authors conduct a qualitative content analysis of four of these ‘blogs’. They co...

Investment and Attention in the Weblog Community

While the weblog medium has grown out of a few modest technological innovations, the social and behavioral aspects of this emerging practice represent a large shift towards a new form of interaction: a massively distributed but completely connected conversation covering every imaginable topic of interest. This paper seeks to understand the social implications of hypertext links within the commu...

Distinguishing Affective States in Weblog Posts

This short paper reports on initial experiments on the use of binary classifiers to distinguish affective states in weblog posts. Using a corpus of English weblog posts, annotated for mood by their authors, we trained support vector machine binary classifiers, and show that a typology of affective states proposed by Scherer et al. is a good starting point for more

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, native language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

Whose Thumb Is It Anyway? Classifying Author Personality from Weblog Text

We report initial results on the relatively novel task of automatic classification of author personality. Using a corpus of personal weblogs, or ‘blogs’, we investigate the accuracy that can be achieved when classifying authors on four important personality traits. We explore both binary and multiple classification, using differing sets of n-gram features. Results are promising for all four tra...

Publication date: 2006